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ABSTRACT 

Decoding of all codons can be achieved by a sub- 
set of tRNAs. In bacteria, certain tRNA species are 
mandatory, while others are auxiliary and are vari- 
ably used. It is currently unknown how this variabil- 
ity has evolved and whether it provides an adaptive 
advantage. Here we shed light on the subset of auxil- 
iary tRNAs, using genomic data from 31 9 bacteria. By 
reconstructing the evolution of tRNAs we show that 
the auxiliary tRNAs are highly dynamic, being fre- 
quently gained and lost along the phylogenetic tree, 
with a clear dominance of loss events for most aux- 
iliary tRNA species. We reveal distinct co-gain and 
co-loss patterns for subsets of the auxiliary tRNAs, 
suggesting that they are subjected to the same selec- 
tion forces. Controlling for phylogenetic dependen- 
cies, we find that the usage of these tRNA species 
is positively correlated with GC content and may de- 
rive directly from nucleotide bias or from preference 
of Watson-Crick codon-anticodon interactions. Our 
results highlight the highly dynamic nature of these 
tRNAs and their complicated balance with codon us- 
age. 

INTRODUCTION 

tRNA is a key molecule in all cells due to its central role 
in protein translation. Each tRNA is an adaptor molecule 
that can be charged with an amino acid residue and donate 
it to an elongating peptide chain based on codon-anticodon 
recognition. Each tRNA species, distinguished by its anti- 
codon sequence, recognizes a specific set of codons, which 
encode the amino acid it is loaded with. This specificity re- 
flects the genetic code and dictates the translation of a nu- 
cleotide sequence into protein. Codon-tRNA interaction 
exhibits two types of redundancy, whereby the same codon 
can be decoded by different tRNA species (having different 
anticodons, but loaded with the same amino acid), and the 
same tRNA species can decode different codons of the same 
amino acid. This is achieved by wobble interactions (non- 



Watson-Crick base pairing) between the third codon po- 
sition and the first anticodon position. Such wobble inter- 
actions can occur due to Guanine-Uracil (G:U) base pair- 
ing, as well as many anticodon modifications that change 
codon specificity (1). Due to this redundancy, not all pos- 
sible tRNA species are required in order to decode the 61 
sense codons, and various organisms employ different sub- 
sets of tRNAs (2). 

The variability of the tRNA repertoire has been mostly 
explored in relation to tRNA abundance, reflected by gene 
copy number. It has been suggested that selection for ef- 
ficient translation leads to close correspondence between 
codon usage and the tRNA pool (3). This led to the devel- 
opment of the tRNA adaptation index (tAI) (4), which was 
used as a measure of translational selection (5-8). In addi- 
tion, translational selection was found to be associated with 
the increased number of tRNA genes (9,10). A binary view 
of an organism tRNA repertoire per se, namely whether a 
tRNA species exists or not, has been less investigated. In the 
work mentioned above, Rocha (9) found that translational 
selection is associated with the decreased number of tRNA 
species. A survey of the tRNA repertoire in representative 
organisms from all kingdoms of life led to the formulation 
of three common strategies employed to achieve reduction 
in the number of tRNA species, where by each strategy some 
of the tRNAs were spared (2). Novoa etal{\\) showed that 
archaea employ the smallest number of tRNA species, bac- 
teria prefer the use of U34N35N36 tRNAs (subscript repre- 
sents position numbering convention where anticodon po- 
sitions 34, 35 and 36 base pair with codon positions 3, 2 and 
1, respectively), in correspondence with the presence of the 
modifying enzyme tRNA-dependent uridine methyltrans- 
ferase, and eukaryotes prefer the use of A34N35N36 tRNAs, 
in correspondence with the presence of the modifying en- 
zyme tRNA-dependent adenosine deaminase. Those stud- 
ies mainly involved the common patterns of tRNA reper- 
toire at the kingdom level. However, little attention has been 
addressed to the variations in tRNA repertoire within a 
kingdom, and it is currently unknown whether these vari- 
ations are shaped to provide an adaptive advantage. 

Here we used a dataset of 319 fully sequenced genus rep- 
resentative bacteria to find patterns of tRNA usage and or- 
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ganism traits associated with them. We found that while 
certain tRNA species are always used, there is a subset of 
tRNA species that differs substantially between bacteria. 
Based on the striking absence of A34N35N36 tRNAs and 
the extended set of wobble rules, this pattern can be ex- 
plained by the need to decode all sense codons, thus sep- 
arating the tRNA species of an organism into mandatory 
and auxiliary tRNAs. The latter are especially intriguing to 
investigate, as while an organism can theoretically do with- 
out them, different organisms show different repertoires of 
these tRNAs, and it is not clear what underlies this vari- 
ability and how it evolved. We use bioinformatic methods 
employing phylogenetic information to explore the changes 
in the tRNA repertoire during evolution, with a special 
emphasis on the auxiliary tRNAs, which were found to 
be highly dynamic, frequently gained and lost along the 
phylogenetic tree. Through analysis of sequence similarity, 
we found evidence pointing at multiple pathways of tRNA 
species gain through horizontal gene transfer (HGT), anti- 
codon mutations, and a combination of both. Finally, the 
variability in the use of auxiliary tRNAs was found to be 
mostly connected to the nucleotide content of the genome, 
but also to some extent to the strength of translational se- 
lection. We discuss possible explanations for the observed 
associations between the tRNA repertoire and both the nu- 
cleotide content and translational selection. 

MATERIALS AND METHODS 

Genome data and organism datasets 

The genomic sequence and genome-related data of 1245 
bacteria was retrieved from the NCBI FTP site (ftp://ftp. 
ncbi.nih.gov/genomes/Bacteria, April 201 1). To reduce bias 
from closely related organisms, one bacterium species per 
genus was randomly selected as representative. We excluded 
endosymbiotic bacteria with severely reduced genomes (less 
than 10 6 bp) and less than 31 tRNA species. This resulted 
in a dataset of 319 bacteria. 

Species tree in Newick format was downloaded from the 
MicrobesOnline database (12) and was trimmed to the cho- 
sen dataset of 319 bacteria. This tree is based on sequence 
alignment of 78 sets of protein orthologs, encoded by a sin- 
gle copy in most bacteria. While HGT events are common 
in prokaryotes, this core of conserved proteins represents a 
central trend of vertical inheritance (13) and can be used to 
construct a reliable species tree (14). 

tRNA repertoire 

Sequences of all non-coding RNAs identified in 319 genus 
representative bacteria (as annotated by RefSeq) were 
scanned using the program tRNAscan-SE with bacteria- 
specific parameters (15). This comprised the repertoire of 
identified tRNAs and their respective anticodons. 

Association with organism traits 

Three genomic traits were considered: genome size, GC 
content and ENC'diff. Genome size and GC content were 
calculated from the genomic sequence. ENC'diff is an es- 
timate of translational selection at an organism scale cal- 
culated as the difference between the average ENC of all 



genes and the genes encoding ribosomal proteins, normal- 
ized by the average ENC of all genes. ENC is a variant of 
the effective number of codons (ENC) index (16) that ac- 
counts for background nucleotide composition (17). ENC 
takes the value of 61 when all codons are used at the fre- 
quency expected given the nucleotide composition, and its 
value decreases as codon usage deviates from the expected. 
Gene ENC was calculated based on the background nu- 
cleotide composition of the same gene. The values of the 
three genomic traits were normalized so that their mean is 
zero and their standard deviation is one. 

We performed regression analysis between the 
presence/absence profile of each tRNA species and 
the genomic traits, which takes into account the phylo- 
genetic relationships between the organisms. This was 
done using the maximum likelihood regression anal- 
ysis option of the Bay es Trait sV2.0 package (18,19) 
(http ://www. evolution . rdg . ac . uk/BayesTraits. html) . While 
designed for purely continuous data, the algorithm can 
also be used when the dependent trait is binary. For each 
presence/absence profile of a tRNA species, the algorithm 
was used to build regression models with each of the three 
genomic traits and for each pair of genomic traits, a total 
of six runs. In addition, the likelihood of the data given 
only the phylogenetic tree was evaluated by calculating the 
maximum likelihood when the regression coefficient with 
each of the genomic traits was set to zero. The param- 
eter X, which effectively scales the branch lengths of the 
phylogenetic tree, was estimated in each run in order to 
allow variation in the strength of the phylogenetic signal. 
The likelihood calculated when the regression coefficient 
was set to zero was compared to the likelihood of the 
regression with genomic trait X to estimate the statistical 
significance of the contribution of trait X. The likelihood 
of the regression with genomic trait Y was compared to the 
likelihood of the regression with both genomic trait X and 
genomic trait Y to estimate the statistical significance of 
the contribution of trait X over trait Y. The statistic used 
is twice the difference between the maximum likelihood 
values of the compared models. This statistic is distributed 
as x 2 with the number of degrees of freedom that is equal 
to the difference in the number of parameters between the 
two models, in this case — one. The association of tRNA 
presence with a genomic trait was considered statistically 
significant and independent of the other two genomic traits 
if adding it to the regression of each of the other traits 
provided statistically significant results. Since the same data 
was repeatedly analyzed, false discovery rate procedure was 
used to control for multiple comparisons based on all tests 
performed (p- value threshold is 8.5 x 10 -3 ). 

To test whether the statistically insignificant association 
of some auxiliary tRNAs with GC content stems from 
the low level of variation in their presence/absence profile, 
for each of these tRNAs we examined the percentage of 
genomes that posses it or lack it, and chose the highest of 
those as a measure of variability. We then compared this 
measure between the set of auxiliary tRNAs that were sta- 
tistically significantly associated with GC content and all 
other auxiliary tRNAs by a Mann- Whitney test. 
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Detecting patterns in tRNA repertoire 

A binary matrix representing the presence/absence of each 
tRNA species in 319 bacteria was constructed. The ma- 
trix was clustered based on Hamming distance between the 
presence/absence profiles of each two tRNA species. 

The GC content in which an auxiliary tRNA species 
tends to appear (usage shift) was denned as follows. Bacteria 
were divided into two groups based on the presence/absence 
of the tRNA, and each group was ordered by GC content. 
The GC content where the usage shift occurs was deter- 
mined as the average between the GC content of the 75 per- 
centile of bacteria in the 'absent' group and the GC content 
of the 25 percentile of bacteria in the 'present' group. This 
procedure was applied only to tRNA species that showed a 
statistically significant difference in GC content distribution 
between the 'absent' and 'present' groups using a Mann- 
Whitney test. 



Ancestral reconstruction 

The GLOOME algorithm (20-22) was used to reconstruct 
the tRNA repertoire in each internal node of the phyloge- 
netic tree as well as to estimate rates of tRNA gain and loss. 
The algorithm was applied to 52 tRNA species in all 319 
organisms simultaneously (tRNA species never observed 
in the studied organisms were excluded). The evolution- 
ary model used was 'variable gain/loss ratio (mixture)' and 
gamma rate distribution. Both of these parameters were 
chosen to allow tRNA species to vary as much as possible 
in their evolutionary model. We also used high optimization 
level. The reconstruction assigns a probability for the pres- 
ence of a tRNA species in each internal node and produces 
an estimation of gain/loss probability per branch. Nodes 
were considered as undergoing gain or loss events if the 
branch leading to them had a gain or loss probability of 
at least 0.8 (see more details in Supplementary Methods). 



Origin of new tRNA species 

Clues as to the origin of a tRNA gene are more likely to 
exist in recently acquired genes. Most recent gain events 
(MRGEs) were defined as gain events that have no other 
gain of the same tRNA species below them in the phylo- 
genetic tree. For a MRGE of tRNA-X, all the genes de- 
scending from the gain were compared against a database 
of tRNA genes from all tRNA species and organisms us- 
ing BLAST. The BLAST score of a tRNA against itself was 
used to evaluate the quality of the match. Only hits that 
scored at least 80% of the self-hit score were further consid- 
ered. Hits were classified as 'vertical inheritance' if they had 
the same anticodon and descended from the same MRGE 
as the query, or otherwise as 'non-vertical inheritance'. For 
each query, the best-scored hits were considered in the ver- 
tical inheritance category and in the non-vertical inheri- 
tance category. The results of the ancestral reconstruction, 
MRGE analysis and the mapping of potential tRNA origins 
were uploaded to the interactive tree of life (iTOL) (23,24) 
for visualization in tree context. 



tRNA co-gain and co-loss and operon structure 

Operon predictions were retrieved from the MicrobesOnline 
database (25). Organisms originating from multiple gain 
events were suspected to acquire the tRNA genes through 
horizontal operon transfer. For each two tRNA species that 
had at least one event of co-gain, the number of organisms 
descending from these events that contain an operon with 
genes of both tRNA species was recorded. The same was 
done for organisms not descending from a co-gain event 
to provide a background of operon co-existence of the two 
tRNA species. 

Nodes where multiple loss events were predicted were sus- 
pected to have lost the tRNA genes through operon dele- 
tion. We looked for evidence for gene proximity in organ- 
isms evolutionarily closest to the co-loss event. Organisms 
were considered nearest neighbors if they descended from 
the parent node of the co-loss event but not from the co- 
loss event itself. For each two tRNA species that had at least 
one event of co-loss, the number of nearest neighbors that 
contain an operon with genes of both tRNA species was 
recorded. The same was done for organisms not considered 
nearest neighbors to provide a background of operon co- 
existence of the two tRNA species. 

RESULTS 

Patterns of tRNA usage across species 

We scanned 319 bacterial genomes to identify their tRNA 
genes (annotated as tRNA species by the anticodon se- 
quence) (Figure 1A). Pseudo tRNAs were excluded. As 
has been previously reported (2,1 1) most A34N35N36 tRNA 
species (except for A34C35G36 tRNA encoding Arginine) 
were found to be missing from all or the vast majority of 
organisms. It is known that Adenosine in the anticodon 
wobble position (position 34) is very efficiently modified 
to Inosine (I), which recognizes codons ending with C, U 
and A. This broad specificity has detrimental effects when 
a codon quartet is split into two pairs of codons (purine- 
and pyrimidine-ending) encoding different amino acids, as 
it leads to ambiguities when decoding the A-ending codon. 
However, when all codons in the quartet encode the same 
amino acid, this broad specificity is not obviously harm- 
ful, as can be seen by the use of such tRNAs in eukary- 
otes (2). Therefore, while it is not clear yet what underlies 
the widespread avoidance of A34N35N36 tRNA species in 
prokaryotes, it seems that there is strong negative selection 
against them. 

The absence of most A34N35N36 tRNAs can explain 
several observed patterns of tRNA usage. Specifically, 
U34N35N36 and G34N35N36 tRNA species are present in all, 
or the vast majority of organisms, since without A34N35N36 
tRNAs, both are necessary for decoding the N1N2A3 and 
NiN2[CU]3 codons, respectively (see the allowed codon- 
anticodon interactions in Figure IB). We therefore refer 
to these tRNA species as mandatory tRNAs. Two other 
mandatory tRNAs are C34A35U36 and C34C35A36, as they 
are the sole decoders of the Met and Trp codons. Since 
the always-present U34N35N36 tRNAs are capable of decod- 
ing N1N2G3 codons (Figure IB), the use of C34N35N36 tR- 
NAs is auxiliary rather than mandatory. Indeed, we found 
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Figure 1. The tRNA repertoire is highly variable. (A) The number of organisms (out of 319) that have at least one copy of a tRNA species. The numbers 
above the bars represent the number of codons in the relevant codon quartet that code for the amino acid decoded by the tRNA. (B) A schematic represen- 
tation of known codon-anticodon interactions. Black solid arrows represent commonly occurring interactions. Gray arrows represent possible interactions 
when Adenosine is modified to Inosine in the first anticodon position. Black dashed arrows represent modifications-dependent interactions when the first 
anticodon position contains Uridine. The thickness of the dashed arrows represents interaction efficiency. 



that the rest of the C34N35N36 tRNA species, which decode 
amino acids with several synonymous codons, show usage 
variability. Notice that the eight G34N35N36 tRNAs decod- 
ing 4-fold degenerate amino acids are probably not as essen- 
tial as those decoding 2- and 3-fold degenerate amino acids 
since they can be supplemented or even replaced by the cor- 
responding U34N35N36 tRNA under certain conditions (as 
discussed below, Figure IB). Indeed these G34N35N36 tR- 
NAs demonstrate greater usage variability and are there- 
fore considered auxiliary tRNAs as well. There are two ma- 
jor exceptions to the pattern described above. He tRNA 
U34A35U36 is not mandatory but rather rarely seen, since in 
order to prevent misreading of the Met codon A1U2G3 as 
He, a specially modified tRNA (not considered in this work) 
exists instead of the conventional U34A35U36 (2). Also, the 
Arg N34C35G36 tRNA species shows different behavior due 
to the use of A34C35G36 tRNA by most bacteria. In the pres- 
ence of A34C35G36 (modified to I34C35G36), G34C35G36 tR- 
NAs becomes hardly used and auxiliary (Figure 1 A and B). 
Since I34C35G36 does not decode the QG2G3 codon, the 
presence of either U34C35G36 or C34C35G36 is mandatory, 
but they can be viewed as auxiliary to one another. 

While the apparent selection against A34N35N36 tRNAs 
combined with codon-anticodon binding possibilities de- 
termines the mandatory and auxiliary tRNAs, an intrigu- 
ing question regards the pattern of usage variability of the 



auxiliary tRNAs in the various genomes and the principles 
underlying it. To try and answer this question, we looked for 
associations between the use of each auxiliary tRNA species 
and several genomic traits. The genomic traits selected rep- 
resent three aspects hypothesized to influence the tRNA 
repertoire: genome size, nucleotide content and transla- 
tional selection. Genome size was shown to be correlated 
with tRNA gene number (4) and may therefore correlate 
with tRNA presence as well. The nucleotide content of the 
genome has a tremendous effect on codon usage, mainly 
through the third codon position (26,27). Co-evolution of 
codon usage and the tRNA pool may therefore be reflected 
as association between the nucleotide content measured by 
the GC content of the organism and the tRNA pool. The 
third genome property is the strength of selection for effi- 
cient translation, shown by Rocha to correlate with the to- 
tal number of tRNA genes and to inversely correlate with 
the total number of tRNA species (9). We used ENC'diff 
as a measure of translational selection, calculated similarly 
to the measure proposed by Rocha in his work (see de- 
tailed description in Materials and Methods and in (28)). 
Briefly, ENC'diff is the normalized difference between the 
average ENC of all genes and the average ENC of riboso- 
mal genes, where ENC is a measure of gene codon bias. The 
stronger the selection towards efficient translation in highly 
expressed genes, the higher the codon bias in the highly ex- 
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pressed ribosomal genes compared to all other genes, result- 
ing in higher ENC'difr- 

We evaluated the association between tRNA repertoire 
and these traits by a regression model. The generally hier- 
archical evolution of species implies that the phylogenetic 
signal should influence to a certain extent the observed 
phenotypes (29). In other words, closely related species 
are expected to have more similar characteristics than re- 
motely related organisms simply due to differences in di- 
vergence time, which may lead to biased estimates of as- 
sociation between genomic traits (29). It is therefore essen- 
tial, when looking for associations between organism traits, 
to take into consideration the phylogenetic relatedness be- 
tween genomes. To this end, we performed regression anal- 
ysis using a phylogenetic generalized least-square approach 
implemented in the BayesTraits package (18,19). As sug- 
gested by Ives and Garland (30), the values of the con- 
tinuous genomic traits were normalized so that the calcu- 
lated regression coefficients will represent effect sizes. Test- 
ing the correlation between each two continuous genomic 
traits while taking into account the phylogenetic tree (down- 
loaded from the MicrobesOnline database (12)) (Supple- 
mentary Methods) revealed a statistically significant corre- 
lation between genome size and GC content (R = 0.305, 
^-value = 2.2 x 10~ 8 ) and very weak correlations between 
ENC'difr and both genome size and GC content (7^ = 0.118, 
/?-value = 3.8 x 10~ 2 and 7? = -0.1 76, /rvalue = 1.6 x 10~ 3 , 
respectively) 

For each tRNA species, the relationship between its bi- 
nary presence/absence profile and the normalized values of 
each of the genomic traits was analyzed, taking into account 
the phylogenetic dependencies. We allowed the algorithm to 
simultaneously estimate the parameter X, which is a mea- 
sure of the phylogenetic signal. X values can range between 0 
(no phylogenetic signal) and 1 (phylogeny perfectly explains 
variation). The values calculated for this parameter were 
often close to one (Supplementary Table SI), indicating a 
strong phylogenetic signal. It is therefore remarkable that 
the variation remaining after controlling for the strong phy- 
logenetic signal is often found to be statistically significantly 
associated with additional genomic traits, as detailed below. 
Since the continuous genomic traits are not independent, we 
considered an association of the dependent variable (tRNA 
repertoire) with trait X to be independent of trait Y if the 
likelihood of the regression model with both X and Y traits 
was statistically significantly higher than the likelihood of 
the regression model with the Y trait alone (Materials and 
Methods). As can be seen in Figure 2, genome size is not 
independently associated with the presence/absence status 
of any tRNA species except for the rarely used U34A35U36- 
Ile. GC content, on the other hand, is positively (and inde- 
pendently from the genome size and ENC'difr traits) associ- 
ated with most of the auxiliary tRNAs, explaining between 
3 and 34% of the usage variability as measured by R 2 . Of 
note, the auxiliary tRNA species that were not statistically 
significantly associated with GC content were more uni- 
formly present or absent in bacteria (Mann-Whitney test, 
p-value = 1.3 x 10~ 3 , see Materials and Methods). Almost 
all the auxiliary tRNAs exhibit a negative slope that re- 
flects a negative correlation with ENC'dirr, although a sta- 
tistically significant and independent negative association 



with ENC'dirr was observed for only three auxiliary tRNA 
species (C 3 4G35A 36 -Ser, C 3 4G3 5 G36-Pro, C 3 4G3 5 U36-Thr). 
This finding is in agreement with Rocha's discovery that the 
total number of tRNA species decreases as translational se- 
lection increases (9). 

We next turned to examine whether the auxiliary tRNAs 
usage evolved randomly or in a certain determined order. 
To address this, we presented the data as a matrix of tRNA 
species over organisms, where each matrix cell contains in- 
formation about the presence/absence of tRNA X in organ- 
ism Y. We clustered this matrix on the tRNA dimension us- 
ing Hamming distance, which measures the number of dif- 
ferences between the profiles of each pair of tRNA species 
(Figure 3 A). Organisms were arranged by their GC content 
(Figure 3B) as this property was found to be dominant in its 
relation to the tRNA profile. As can be seen in Figure 3A, 
there appears to be an order of auxiliary tRNA gain or loss. 
G34N35N36 and C34N35N36 tRNAs form separate clusters, 
with the exception of C34A 35 A3 6 -Leu and C34C3 5 U36-Arg, 
which are included in the G34N35N36-cluster. In general, the 
usage shift (Materials and Methods) of G34N35N36 tRNAs 
occurs in consistently lower GC-content values than that of 
their C34N35N36 counterparts (Wilcoxon signed rank test p- 
value = 0.0156). 

The codon quartet of Arg (tRNA N34C35G36) is an inter- 
esting special case as it is the only case where the A34N35N36 
tRNA is regularly used (Figure 3C and D) Since A34C35G36 
modified to I34C35G36 does not decode the codon QG2G3 
(see Figure IB), an additional tRNA is required. In most 
bacteria (239) this is achieved by C34C35G36. However in 24 
bacteria the additional tRNA is U34C35G36, and in 26 bac- 
teria both tRNAs are employed. Of note, C34C35G36 tRNA 
is used when GC content is high and U34C35G36 when it 
is low. Furthermore, the rare event of supplementing the 
A34C35G36 tRNA with G34C35G36 tRNA occurs only when 
GC content is high. In the infrequent absence of A34C35G36 
tRNA, both G34C35G36 and U34C35G36 are required to de- 
code the quartet. In nine out of 19 cases, the C34C35G36 
tRNA is also found but only when GC content is high. It 
would appear that even in this more complicated case of 
tRNA combinatorics, the rule governing the use of auxil- 
iary tRNAs is that G34N35N36 and C34N35N36 are enlisted 
when GC content is high and the non-essential U34N35N36 
is enlisted when GC content is low. 

tRNA dynamics is dominated by gene loss 

To better understand the dynamic processes that shape 
tRNA usage, an evolutionary reconstruction is essential. 
We used the GLOOME software (20-22) to combine tRNA 
presence/absence profile with the phylogenetic tree and thus 
reconstructed the probable evolutionary history of each 
tRNA species. GLOOME implements a stochastic mapping 
approach to assign gain and loss events onto each branch 
of a phylogenetic tree based on the topology and branch 
lengths of the tree. The algorithm was used to infer branch- 
specific and tRNA-specific gain and loss events using an 
evolutionary model that permits the gain/loss ratio to vary 
among tRNAs, thereby allowing for different evolutionary 
behaviors for the various tRNAs. While the expectation 
of gain/loss per branch depends on its length, the overall 
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Figure 2. Auxiliary tRNA presence/absence profile is mostly associated with GC content. Regression analysis of tRNA presence/absence profile with 
genome size (top), GC content (middle) and ENC'diff (bottom). RNA species that exhibited little presence/absence variability (when three or less organisms 
had a different presence/absence status than the majority of organisms) are not presented. In parentheses are the number of codons in the relevant codon 
quartet that code for the amino acid decoded by the tRNA and the number of organisms in which the tRNA species was found (out of 319). R 2 is the 
amount of variation explained by the regression model (note the different R 2 range). Asterisks denote statistically independent genomic traits that when 
added to regression between the dependent variable (tRNA presence/absence) and each of the additional genomic traits, statistically significantly improved 
the fit of the data to the calculated regression model. 



tRNA gain/loss rates are the sum of gain/loss expectations 
predicted per branch over all the branches of the tree and 
are therefore comparable (Figure 4). As expected, manda- 
tory tRNAs have very low loss and gain rates. A notable ex- 
ception is U34A3 5 A 36 -Leu where a scenario of multiple loss 
events that occurred late in evolution and a scenario of few 
early losses followed by multiple gain events both explain 
the tRNA pattern and are therefore summed together to 
produce high gain and loss rates. Compared to mandatory 
tRNAs, auxiliary tRNAs tend to have much higher gain and 
loss rates. While the rates are highly varied, the dominant 
trend is of tRNA loss. Only four auxiliary tRNAs are gained 
at a higher rate than they are lost. These include the rarely 
used Arg tRNAs G34C35G36 and U34C35G36 that are gained 
as a replacement when the commonly used A34C35G36 and 
C34C35G36 are lost, and the highly dynamic C34U35C36-GIU 
and C34U3 5 G36-Gln. GLOOME calculates a probabilistic 
reconstruction of ancestral states (see example in Supple- 
mentary Figure SI). While there is a distinct signal of verti- 
cal inheritance demonstrated by the clusters of related bac- 



teria, all either possessing or missing a specific tRNA, it 
is also obvious that multiple gain and loss events have oc- 
curred repeatedly and independently throughout evolution 
for most auxiliary tRNAs. The ancestral reconstruction of 
all auxiliary tRNA species suggests with high probability 
that they were present at the bacterial ancestor (Supplemen- 
tary Methods and Supplementary Table S2), as occurs of- 
ten in extant GC-rich bacteria. This may point to a GC- 
rich ancestor or to an underlying environmental influence in 
which the extensive tRNA set promoted survival. Further- 
more, this implies that tRNA species with low gain rate like 
C34A35C36-Val and C34G35C36-Ala once lost are not gained 
again, whereas for tRNA species with higher gain rate, the 
loss is not as final. 

Identification of gain and loss events makes it also possi- 
ble to assess the dependencies between the different tRNA 
species. We used random simulations (Supplementary Ma- 
terials and Methods) to generate a background distribution 
of independently gained and lost tRNA species, to which 
the actual distribution of gain and loss events was com- 
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Figure 3. GC content effect on auxiliary tRNA species is variable. (A) Clustering of the presence/absence (black/white, respectively) profiles of auxiliary 
tRNA species in 319 bacteria. tRNA species are clustered based on Hamming distance while bacteria are arranged by their GC content presented in (B). 
The GC content where a tRNA species tends to appear/disappear (usage shift) was defined as the mean between the GC content of the 75 percentile of 
organisms missing the tRNA and the 25 percentile of organisms that possess the tRNA. Asterisks mark the usage shift location based on B. In parentheses 
are the number of codons in the relevant codon quartet that code for the amino acid decoded by the tRNA, and the usage shift value, separated by a comma. 
(C) The same as A for the four Arg tRNA species of the type N34C35G36. (D) Distribution of Arg tRNA species combinations among the genomes. 



pared. This revealed that the observed distribution is statis- 
tically significantly biased towards nodes originating from 
multiple gain or loss events (Supplementary Figure S2), 
pointing to strong dependencies between tRNA species. 
Analysis of the individual dependencies between each pair 
of tRNA species (Supplementary Figure S3) revealed that 
C34U35C36-GIU and C34U35G36-Gln are co-gained in 10 



nodes out of a maximum of 14 (such a result or higher was 
never reproduced in 10 4 random simulations, Supplemen- 
tary Methods). C34C35C36-GIV, C3 4 G35A 36 -Ser, C34G35G36- 
Pro and C34G35U36-Thr also tended to be co-gained with 
each other more than expected at random. Statistically sig- 
nificant co-loss is found in almost all C34N35N36 auxiliary 
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Figure 4. Auxiliary tRNA evolution is highly dynamic and dominated by gene loss. Sum of gain (blue) and loss (red) branch probabilities over the phylo- 
genetic tree calculated by GLOOME. tRNA species are listed alphabetically by their anticodon sequence. Light gray dots mark auxiliary tRNA species. 
Dark grey dots mark Arg auxiliary tRNA species. 



tRNAs and to a lesser extent in G34N35N36 auxiliary tRNAs 
(mainly in G34G3sC36-Ala and G34G35G36-Pro). 

A detailed examination of sequential gain and loss 
events (Supplementary Methods and Supplementary Fig- 
ure S4) revealed that there are also certain tRNA species 
that appeared in a specific order whenever they are prox- 
imal in the phylogenetic tree. Remarkably, these tRNA 
species (i.e. C34G35G36-Pro, C34G35U36-Thr, C34C35C36- 
Gly, C34G3 5 A 36 -Ser and C34U3 5 G36-Gln) are the same 
tRNA species that tended to be co-gained. While the results 
were not statistically significant due to the small number of 
instances of each type, they form a coherent hierarchy of 
tRNA gain. Loss event order was less clear and loss hierar- 
chy could not be established. It is possible that the selective 
pressure on tRNA loss is stronger than the selective pressure 
on tRNA gain or that tRNA loss is more easily achieved 
than tRNA gain. This can lead to rapid gene eliminations 
that appear simultaneously. 

tRNA gain through horizontal gene transfer and anticodon 
mutations 

tRNA dynamics, while dominated by gene loss, also in- 
cludes gain events. An interesting question regards the 
source of new tRNA genes in a genome. Possible mecha- 
nisms are HGT and anticodon mutations of already present 
tRNA genes. Tracking gene origin is difficult due to genetic 
variation that accumulates with time and erases sequence- 
embedded evidence. tRNA genes are especially problematic 
since in such short sequences every mutation eliminates a 
significant evolutionary signal. Furthermore, since tRNA 
function largely depends on its tertiary structure, structure- 
preserving mutations are more easily accumulated. It was 
shown in bacteria that genes of the same tRNA species 
are conserved in only 60-85% of their sequence (31). How- 
ever, since sequence similarity rapidly deteriorates, the find- 
ing of highly similar sequences probably hints at a com- 
mon ancestor that was shared not long ago. In order to 
find such events, we identified the MRGEs of each tRNA 



species (see schematic example in Figure 5A), and all the 
tRNA genes that descended vertically from them, and ran a 
BLAST search of these genes against a database containing 
all tRNA genes from the 319 bacteria. Since only strong se- 
quence resemblance is likely to suggest a shared origin, the 
BLAST score cutoff was set at 80% of BLAST score of the 
query against itself (maximal score), which roughly corre- 
sponds to 3^ mismatches or 1-2 gaps for a typical tRNA 
gene. Importantly, this procedure is aimed at identifying 
with high confidence the origin of new tRNA species. It does 
not provide a comprehensive inventory of HGT of tRNA 
genes or anticodon mutations as it misses events where 
gene copy number is changed (not considered as MRGE) 
or multiple mutations have accumulated (old events that do 
not pass the strict threshold). Sequence similarity was at- 
tributed to vertical inheritance if the query and hit tRNA 
genes had the same anticodon and descended from the same 
MRGE (Figure 5B) and to non-vertical inheritance other- 
wise. For each query, the highest scored hits in each of these 
two categories were considered. Since a query could show 
high similarity both to tRNAs belonging to the same clade 
(explained by vertical inheritance) and to tRNAs of distant 
organisms (suggesting HGT), this procedure allowed the 
detection of the source of a horizontally transferred tRNA 
to an ancestor of a bacterial clade. 

Out of 403 BLAST hits that passed the threshold, the 
vast majority (345) was explained by vertical inheritance. 
We therefore concentrated on the other 58 results. In 53 out 
of 58 non-vertical inheritance results the query and hit had 
the same anticodon, but did not descend from the same 
MRGE event (Figure 5C), which suggests a HGT event. 
Twenty-two of them are presented in Figure 6A and ap- 
pear to represent a single HGT event. In this example, an 
MRGE of C 3 4G35U36-Thr tRNA was found in a clade be- 
longing to the Enterobacteriaceae family. This gain event 
appears to be relatively recent since 1 1 of the organisms in 
the clade (out of 13 that did not lose the C34G35U36-Thr 
gene) also show a strong vertical inheritance signal, which 
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Figure 5. tRNA sequence similarity may reveal the origin of newly acquired tRNA genes. (A) A schematic example of tRNA-X ancestral reconstruction. 
The shade of the blue dots indicates the probability of tRNA-X presence (zero probability is not presented, low probabilities appear white). Red dots 
mark nodes of MRGEs, where tRNA-X gain is predicted with probability of at least 0.8, and no additional gain event of tRNA-X is found below it in the 
tree. Yellow dots mark all the currently extant organisms descending from an ancestor where an MRGE occurred. The grey lines connect two organisms 
demonstrating high tRNA gene sequence similarity according to BLAST. (B-F) Mechanisms most likely to explain tRNA gene similarity: (B) Vertical 
gene transfer can explain similarity of two tRNA-X genes (yellow curly lines) descending from the same MRGE. (C) Horizontal gene transfer can explain 
similarity of two tRNA-X genes descending from different MRGEs. (D) Horizontal gene transfer of tRNA-Y (magenta curly line) followed by mutations 
transforming tRNA-Y to tRNA-X can explain similarity of tRNA-X and tRNA-Y genes from distantly related organisms. (E) Similar to D but in closely 
related organisms. Due to vertical gene transfer, closely related organisms have a copy of tRNA-Y. A mutation that transforms tRNA-Y to tRNA-X in one 
organism creates very similar orthologs of different tRNA species. (F) Gene duplication followed by anticodon mutation creates very similar paralogous 
genes that encode different tRNA species. 



indicates they had little time to diverge. These tRNA genes 
are very similar to two C34G3 5 U36-Thr genes of the Neis- 
seriaceae family (thus creating the 22 non-vertical BLAST 
hits), suggesting that C34G3sU36-Thr tRNA gene was hori- 
zontally transferred from an organism in the Neisseriaceae 
family to the Enterobacteriaceae ancestor. This explanation 
is further supported by the fact that organisms in both fam- 
ilies are host-associated (most are human pathogens) pro- 
viding ample opportunities for sharing genetic material. 

The second gain mechanism mentioned above is through 
mutation in the anticodon of an already present tRNA gene 
that transforms it from one tRNA species to another. There 
are three possible scenarios involving anticodon mutation: 
(i) tRNA gene is horizontally transferred and then under- 
goes anticodon mutation (Figure 5D), resulting in distantly 
related organisms having very similar tRNA genes that dif- 
fer in their anticodon. (ii) A mutation occurs in a tRNA 
gene, transforming it to a different tRNA species (Figure 
5E), resulting in closely related organisms having very sim- 
ilar orthologs that differ in their anticodon. (iii) A muta- 
tion occurs in a tRNA gene after a duplication event oc- 



curred (Figure 5F), resulting in very similar paralogs that 
differ in their anticodon. We identified two instances of the 
first scenario and three instances of the third scenario. Fig- 
ure 6B and C show representative cases of each scenario. In 
Figure 6B, an MRGE of U 34 C35G36-Arg tRNA was identi- 
fied in Natranaerobius thermophilus (taxonomy ID: 457570) 
and showed the highest similarity to the A34C3sG36-Arg 
tRNA gene of Carboxydothermus hydrogenoformans (tax- 
onomy ID: 246194), which belongs to an entirely different 
order and is only distantly related. The evolutionary dis- 
tance points to a HGT and mutation rather than just a mu- 
tation. Interestingly, the presumed gene donor is an anaer- 
obic hyperthermophile isolated from a hot swamp whereas 
the gene acceptor is an anaerobic thermophile isolated from 
a solar-heated lake. While not sharing similar tastes in pH 
and salinity, it is possible that at one time these organisms, 
their ancestors or close relatives shared the same environ- 
ment and exchanged genes. The MRGE of C34C35C36-Gly 
tRNA in Capnocytophaga ochracea (taxonomy ID: 521097, 
Figure 6C) was found to be most similar to the G34C35C36- 
Gly tRNA gene of the same organism. This suggests that 
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Figure 6. tRNA genes are acquired by multiple pathways. (A-C) Similarities between tRNA genes can explain how tRNA genes are acquired. Tree vi- 
sualization is as described for Figure 5A. Arrows connect organisms that have similar tRNA genes. The arrows are black if the similarity is explained 
by vertical inheritance and green if by non-vertical inheritance. Clades not involved in gene transfer are collapsed (black triangles). (A) C34G35U36-Thr 
tRNA genes in the Enterobacteriaceae family show similarity within the family and are similar to C34G35U36-Thr in the Neisseriaceae family, suggesting 
horizontal gene transfer of C34G35U36-Thr from the Neisseriaceae family to the Enterobacteriaceae family, followed by vertical gene spread. (3, 7, E and 
N mark the clades of the Betaproteobacteria and Gammaproteobacteria classes and the Enterobacteriaceae and Neisseriaceae families, respectively. (B) The 
A34C35G36-Arg tRNA gene in Carboxydothermus hydrogenoformans (taxonomy ID: 246194) is the most similar gene to the new U34C35G36-Arg tRNA in 
Natranaerobius thermophilus (taxonomy ID: 457570), suggesting horizontal gene transfer followed by an A to T mutation in the first anticodon position. 
(C) G34C35C36-Gly tRNA gene in Capnocytophaga ochracea (taxonomy ID: 521097) is the most similar gene to the new C34C35C36-Gly tRNA of the 
same organism, suggesting that the G34C35C36-Gly tRNA gene underwent duplication followed by a G to C mutation in the first anticodon position. (D) 
Alignment of the two tRNA genes discussed in C. The duplication/mutation theory is supported by similarities in the sequences flanking the genes. The 
anticodons of both genes are marked in red. 



the original G34C35C36-Gly gene was duplicated and then 
mutated from G to C in the first anticodon position. This 
explanation is supported by sequence similarity mostly up- 
stream to the gene, that was not found with any other tRNA 
gene, as would be expected from gene duplication (Figure 
6D). 

These results demonstrate that tRNA genes can be gained 
through HGT, anticodon mutations or a combination of 
both. The fact that we did not find any cases of anticodon 
mutation without gene duplication (Figure 5E) may suggest 
that these genes are added to, rather than replace, existing 
tRNA genes, thereby reinforcing the accessory role of the 
gained tRNAs. Remarkably, in all five cases where an an- 
ticodon mutation was thought to explain tRNA gain, the 



original and the new tRNA genes always decoded the same 
amino acid (U34C35C 36 -Gly/C34C35C36-Gly, G 3 4C35C 36 - 
Gly/C 3 4C3 5 C36-Gly, U34U35G36-Gln/C34U3 5 G36-Gln and 
twice A34C35G36-Arg/T34C35G36-Arg). This is not surpris- 
ing considering that the original genes are built to be recog- 
nized by the correct aminoacyl tRNA synthetase, and few if 
any additional modifications are required to achieve a func- 
tional tRNA gene with altered codon specificity. 

Operon architecture effect is limited to C34U35C36-GI11 and 
C34U35G36-GI11 co-gain 

Co-gain or co-loss of tRNA species could occur due to 
the physical proximity of their encoding genes. It is widely 
acknowledged that tRNA genes tend to cluster in oper- 
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ons, often accompanied by rRNA genes (32-36). We set 
out to examine if multiple events of tRNA gene gain or 
loss can be explained by operon gain or loss. The operon 
data, downloaded from MicrobesOnline, covered at least 
60% of the tRNA genes of most organisms (average of 
84%, Supplementary Figure S5A). The low coverage ob- 
served in 20 organisms is due to genome version differ- 
ences that prevented tRNA gene mapping to operons, but 
is not expected to bias the results. On average, 23.8% of 
tRNA genes were found in an operon with other tRNA 
genes (Supplementary Figure S5B). This is lower than ex- 
pected based on several well-annotated organisms {Bacil- 
lus subtilis 168 - 90.7%, Escherichia coli K12 MG1655 - 
69.8%, Listeria monocytogenes EGD-e - 88.1%, Mycobac- 
terium leprae TN - 35.6%, Neisseria meningitidis MC58 - 
69.5%, Pseudomonas aeruginosa PAOl - 58.7%, Strepto- 
myces coelicolor A3 (2) - 29.2%; data taken from (37,38) 
and is probably an underestimation due to the high percent 
of genes that have no operon annotation. In the event of 
tRNA co-gain driven by a horizontally transferred operon 
or part of an operon, the operon structure is expected to 
persist in descending organisms until blurred by gene gains, 
losses and translocations. We therefore counted for each two 
tRNA species, the number of organisms descending from 
a co-gain event in which genes of both tRNA species were 
found in the same operon (Table 1). The only tRNA pair for 
which evidence for a co-gain event through operon transfer 
was found is C34U3 5 C36-Glu/C34U3 5 G36-Gln. Out of 12 or- 
ganisms descending from C34U35C36-Glu/C34U35G36-Gln 
co-gain nodes in which both tRNA species were present 
and had operon information, in seven organisms both 
tRNA genes were found in the same operon. All the iden- 
tified operons contain only the two co-gained genes and 
in all of them the first gene is C34U35G36-Gln followed by 
C34U35C36-GIU. The operon sequences were aligned and the 
sequence differences were used to construct an unrooted se- 
quence similarity tree (Supplementary Figure S6). Operons 
descending from a co-gain event showed higher similarity 
among themselves as would be expected from vertical inher- 
itance following horizontal operon transfer. Interestingly, in 
Acidaminococcus fermentans DSM 20731 (Taxonomy ID: 
591001) a perfect duplicate of the C34U35G36-Gln gene pre- 
cedes the operon. It is possible that gaining the operon did 
not satisfy the need of the organism, and a duplication of 
C34U35G36-Gln was also required. 

A similar reasoning was applied to co-loss events, where 
evidence for operon structure was expected to be strongest 
in organisms closest to the co-loss event. The organisms 
considered closest are those descending from the parent of 
the co-loss branch but not from the co-loss branch itself, 
where unless a gain event has occurred, the tRNA species 
should be missing (Supplementary Table S3). In seven pairs 
of tRNA species, each representing only one event of co- 
loss, at least one of the close organisms had an operon with 
genes of both tRNA species. However, in all these instances, 
the co-loss event is accompanied by two to 11 additional 
tRNA losses that have no evidence for sharing the same 
operon (details can be found in the text accompanying Sup- 
plementary Table S3). It therefore appears that strong selec- 
tion operating in parallel to eliminate tRNA genes explains 
the multiple losses better than operon loss. 



DISCUSSION 

Analysis of tRNA usage has revealed a subset of tRNA 
species that we termed auxiliary, since they appear to sup- 
plement mandatory tRNA species in some but not all or- 
ganisms. One of our main aims in this work has been to 
define and investigate the variability in the use of auxiliary 
tRNA species in the bacterial world. 

Selection for translational efficiency through tRNA reuse 

When examining the changes in tRNA repertoire in regard 
to ENC'difr, a measure of translational selection, we found 
that the presence of auxiliary tRNA species is weakly nega- 
tively associated with ENC'difr (Figure 2 and Supplemen- 
tary Table SI). While the results were statistically signifi- 
cant independently of both GC content and genome size 
in only three of the tRNA species, the same trend was ob- 
served for most auxiliary tRNAs. These findings are in ac- 
cord with Rocha's conclusion that translational efficiency is 
associated with economy in the number of tRNA species ac- 
companied by an increase in their gene copy number (9). We 
propose that the association of ENC'difr with the reduced 
number of tRNA species may derive from a more efficient 
recycling of previously used tRNAs. It was shown for Sac- 
charomyces cerevisiae (39), and multiple bacteria (40) and 
archaea (41) that successive occurrences of the same amino 
acid favor codons that use the same tRNA. This suggests 
that reloading a tRNA that has just donated its amino acid 
to the elongating protein with a new amino acid is more ef- 
ficient than waiting for a preloaded tRNA to diffuse to the 
translation site. Thus, if one tRNA species is the sole de- 
coder of a set of codons, it does not matter which codon 
is used when the same amino acid is encoded next because 
all codons will be able to reuse that tRNA species. However, 
when more than one tRNA species is available, the situation 
is suboptimal. For example, the U34N35N36 tRNA decoding 
N1N2A3 or N1N2G3 codons can be reused to decode down- 
stream codons of both types, while the auxiliary C34N35N36 
tRNA can only be reused to decode downstream N1N2G3 
codons, thus reducing translation efficiency. In the case of 4- 
fold degenerate amino acid decoding, maximal tRNA reuse 
is achieved when U34N35N36 tRNA is the sole decoder of 
the entire codon quartet. This type of superwobbling was 
shown, however, to be less efficient for the U34:C3 uncon- 
ventional wobble interaction (42-44) and is therefore ex- 
pected to be more prevalent when GC content is low, as was 
indeed observed (2,45,46). As shown in Figure 3, our anal- 
ysis revealed as well that C34N35N36 and G34N35N36 tRNA 
species tend to be both absent only when GC content is low. 

The association between the tRNA repertoire and GC content 
can be explained by selection for Watson-Crick interactions 
or by nucleotide bias 

As GC content decreases, the use of most auxiliary tRNA 
species decreases. A preference of Watson-Crick over wob- 
ble interactions may provide an explanation for these re- 
sults. While the N1N2U3 codon is always decoded using 
wobble (either by the commonly used G34N35N36 or by 
the more rarely used U34N35N36 and I34N35N36 tRNA 



Nucleic Acids Research, 2014, Vol 42, No. 10 6563 



Table 1. C34U35C36-GIU/C34U35G36-GI11 tRNA co-gain is associated with operon structure 
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a The cartoon is a simplified example aimed at demonstrating the considered nodes. If two tRNAs were gained by horizontal operon transfer, it is more 
likely that the operon structure will be preserved in organisms descending from the co-gain event. 

b The cartoon is a simplified example aimed at demonstrating the considered nodes. Nodes not descending from a co-gain event are not suspected to have 

gone operon transfer and therefore provide a background estimate of finding the two tRNAs in the same operon. 

c The number of nodes depicted by colorful dots in the cartoon. 

d The number of nodes in which at least one tRNA gene of each species is present. 

e The number of nodes in which at least one tRNA gene of each species has operon data. 

f The number of nodes in which there is at least one operon that contains at least one tRNA gene of each species. 



species) and the N1N2A3 codon is almost always de- 
coded using a Watson-Crick interaction (by the U34N35N36 
tRNA species), the N1N2C3 and N1N2G3 codons can ei- 
ther form a Watson-Crick interaction with the G34N35N36 
and C34N35N36 tRNA species, respectively, or a wobble in- 
teraction otherwise (Figure IB). The observed coordinated 
changes in GC content and the tRNA repertoire maxi- 
mize the number of Watson-Crick interactions by match- 
ing the abundant codons with their fully complementary 
tRNA species. However, except for superwobbling, which 
was shown to be less efficient than Watson-Crick and wob- 
ble interactions, the literature is inconclusive regarding the 
efficiency differences of wobble and Watson-Crick inter- 
actions. Some works claim that standard wobble interac- 
tions are less efficient than Watson-Crick interactions (47- 
50) whereas others find similar efficiency (51,52), and some- 
times even increased efficiency (53,54) of wobble compared 
to Watson-Crick interactions. It was shown that, at least 
in the case of U34N35N36 tRNA species, efficiency depends 
on U34 modifications (49,54,55), which are unknown for 
the vast majority of organisms. Furthermore, a clear pref- 
erence associated with the strength of translational selec- 
tion was shown for N1N2U3 over N1N2C3 codons in 4-fold 
degenerate amino acids (28), indicating selection towards 
wobble rather than Watson-Crick interaction. It is there- 
fore impossible to firmly conclude that Watson-Crick in- 
teractions indeed have an advantage over wobble interac- 
tions. However, assuming that Watson-Crick interactions 
are preferred, the much stronger association observed be- 
tween the tRNA repertoire and GC content compared to 
ENC'diff implies that Watson-Crick interactions are maxi- 
mized in organisms independent of the strength of transla- 
tional selection, and may suggest a basic translation opti- 
mization principle observed by all organisms. If this is true, 
reducing the number of tRNA species in order to improve 
translation efficiency should not counter the basic mecha- 
nism of maximizing Watson-Crick interactions. The ability 



to balance these requirements greatly depends on the GC 
content. When GC content is high, a minimal tRNA set 
that improves tRNA availability reduces Watson-Crick in- 
teractions (the abundant N1N2G3 and N1N2C3 codons are 
decoded using wobble rather than Watson-Crick interac- 
tions) and should therefore be discouraged. However, when 
GC content is low, the minimal tRNA set has little effect 
on Watson-Crick interactions (the abundant N1N2A3 and 
N1N2U3 codons continue to be decoded by Watson-Crick 
and wobble interactions, respectively) and is therefore pre- 
ferred. Indeed, it is of note that strong translational selec- 
tion manifested in high ENC'diff tends to occur in organ- 
isms with low GC content (data not shown) and may be 
responsible for the observed weak negative correlation be- 
tween these two traits (reported above). 

An alternative explanation for this phenomenon is that 
the poorly understood forces shaping nucleotide content 
(56,57) also influence the nucleotide content of the first an- 
ticodon position of tRNA genes. While several studies in 
both archaea (58) and bacteria (59,60) have shown that the 
nucleotide content of tRNA genes does not comply with the 
genomic GC content but is rather restricted by habitat tem- 
perature through constraints on folding stability, the wob- 
ble position, which does not contribute to structure stabil- 
ity, might be free from these constraints and conform to nu- 
cleotide content bias. 

Importantly, bacterial GC content does not explain all 
the variation in auxiliary tRNA usage. It is conceivable that 
in some cases tRNA genes are gained through mobile ele- 
ments such as plasmids and prophages and therefore fit the 
mobile element GC content rather than the GC content of 
the host bacteria. 



Following tRNA usage through evolution 

We explored the changes in tRNA usage throughout evo- 
lution by following the tRNA repertoire of a large dataset 
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of bacteria within their phylogenetic context. This power- 
ful approach revealed that auxiliary tRNAs are highly dy- 
namic genes that tend to be lost at a high rate but also have a 
substantial gain probability. This analysis also provided ev- 
idence of complicated dependencies between tRNA species, 
hinting towards their ordered acquisition. It is, however, not 
clear what underlies this order. There is no trivial connec- 
tion between this order and codon-anticodon interaction, 
as tRNAs with a very similar interaction potential, such as 
C34A 35 G36-Leu and C34A 35 C36-Val or C34G3 5 G36-Pro and 
C34G35C36-Ala, are not similarly recruited. While it is pos- 
sible that the order of tRNA recruitment in the various 
genomes is linked to the abundance of the codons or amino 
acids the tRNAs decode, which might affect the strength 
of selection towards their use, such an association was not 
identified (data not shown). 

Due to specific interests of the scientific community 
and difficulties in maintaining the organism cultures pre- 
viously required for sequencing, genome sequencing has 
been highly biased towards certain clades. This resulted in 
a few dense clades but also many sparse clades, where the 
tRNA repertoire has changed considerably, but the lack of 
bifurcations makes it impossible to understand the order of 
events. Adding more organisms belonging to the genera al- 
ready represented is not likely to be helpful since organisms 
in the same genus usually have similar tRNA repertoires. 
However, it is possible that the rapidly growing coverage of 
the prokaryotic world, through culture independent and in- 
creasingly cheaper genome sequencing, will provide data to 
better resolve the order of tRNA gain and loss. 

Despite the current data limitation, one pair of tRNA 
species demonstrated a striking dependency in their pat- 
tern of appearance. C34U35C36-GIU and C34U35G36-Gln are 
either present or absent together in approximately 90% 
of the organisms. They provide the most extreme exam- 
ple of co-gain since out of 14 and 17 gain events found 
for C34U35C36-GIU and C34U35G36-Gln, respectively, 10 of 
them co-occurred (Supplementary Figure S6). Also, while 
less statistically significant, they are co-lost together more 
than expected by chance. We also show evidence that these 
two tRNA species tend to be encoded in the same operon. 
We originally hypothesized that arranging auxiliary tR- 
NAs in operons may be an evolutionary mechanism to fa- 
cilitate coordinated changes in their genomic occurrence. 
However, since such arrangement was only observed for 
C34U35C36-Glu/C34U35G36-Gln, it does not appear that 
physical proximity is required in order to induce coordi- 
nated change of multiple tRNA species. In most bacte- 
ria, the glutamine-specific tRNA Gin is first aminoacy- 
lated with glutamate, which is converted to glutamine in 
an amidotransferase reaction requiring adenosine triphos- 
phate (61,62). The frequent clustering of C34U35C36-GIU 
and C34U35G36-Gln in operons may therefore promote an 
efficient regulation by their shared amino acid. Inciden- 
tally, the ratio of C34U35C36-GIU and C 3 4U35G 3 6-Gln tRNA 
molecules effectively participating in translation is likely to 
be lower than of tRNAs of other amino acids since Glu- 
tRNA is also a precursor of tetrapyrrole pigments (e.g. 
chlorophylls and hemes) (63-65) and Gln-tRNA amino 
acid loading is slower due to the additional step of Glu 
to Gin conversion. It is possible that this reduced yield 



decreases translational efficiency and is therefore compen- 
sated by careful fine-tuning of the tRNA repertoire. 

CONCLUSIONS 

The tRNA repertoire can have a tremendous impact on or- 
ganism fitness. This was mostly shown through the associ- 
ation of tRNA gene copy number and codon usage in rela- 
tion to the strength of translational selection. However, the 
evolution and possible effects of the types of tRNA species 
employed have remained mainly unexplored. We were able 
to show a weak negative association between the repertoire 
of auxiliary tRNA species and translational selection and a 
stronger association with nucleotide content. While the for- 
mer weak association might indicate an adaptive strategy 
that promotes translational efficiency, it is unclear whether 
the latter association is adaptive or neutral. Additionally, by 
incorporating phylogenetic data, changes in the evolution 
of the auxiliary tRNA repertoire were traced. This revealed 
the dynamic nature of auxiliary tRNA species and points to 
the ease in which such genes can be acquired and lost. We 
believe that the approach we describe for phylogeny-driven 
analysis of the auxiliary tRNA repertoire coupled with the 
continuously growing number of sequenced genomes will 
provide a better resolution of the dynamics of the tRNA 
repertoire and determine whether tRNA recruitment is or- 
dered, and if so, which properties underlie this order. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 
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